Statistical Strategies for Pruning All the Uninteresting Association Rules
Abstract
We propose a general framework to formalize the problem of capturing the intensity of implication for association rules through statistical metrics. In this framework we present properties that influence the interestingness of a rule, analyze the conditions that lead a measure to perform a perfect prune at a time, and define a final proper order to sort the surviving rules. We discuss why none of the currently employed measures can capture objective interestingness, and why only a combination of some of them, applied in a multi-step fashion, can be reliable. In contrast, we propose a new simple modification of the Pearson coefficient that meets all the necessary requirements. We statistically infer a convenient cut-off threshold for this new metric by empirically describing its distribution function through simulation. Experiments show promising behaviour of our proposal.
1 PROBLEM FORMULATION
One of the most relevant tasks in Knowledge Discovery in Databases is mining for association rules in large masses of data, as first formulated in [1]. This task is often decomposed into two separate phases: (1) finding all the frequent itemsets having support over a user-specified threshold, and (2) generating the association rules from the maximal discovered frequent itemsets.
The input of a frequent sets algorithm is a database $\mathcal{D}$, composed of a collection of transactions, where each transaction is a subset of a given fixed set of items $\mathcal{I}$. Let $X \subseteq \mathcal{I}$ be an itemset, and let $\mathrm{supp}(X)$ be the ratio of the number of transactions in which $X$ appears to the number of all transactions in $\mathcal{D}$, i.e. $\mathrm{supp}(X) = |\{t \in \mathcal{D} : X \subseteq t\}| / |\mathcal{D}|$. We denote the support of an itemset $X$ by $\mathrm{supp}(X)$. An itemset is called frequent if its support exceeds a given user-specified threshold, $\sigma$.
In the second phase, association rules are constructed from those maximal frequent sets. In brief, given any maximal frequent itemset $Z$, an association rule is an expression
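The support ratio and the frequency test described in the problem formulation can be illustrated with a minimal Python sketch. The function names `support` and `is_frequent`, the parameter name `sigma`, and the toy database `toy_db` are assumptions introduced here for illustration; they do not come from the paper, and the sketch scans the database naively rather than using any particular frequent-set algorithm.

```python
from typing import Hashable, Iterable, List, Set


def support(itemset: Iterable[Hashable], transactions: List[Set[Hashable]]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        raise ValueError("the database must contain at least one transaction")
    items = frozenset(itemset)
    # A transaction counts when the itemset is a subset of it.
    hits = sum(1 for t in transactions if items <= frozenset(t))
    return hits / len(transactions)


def is_frequent(itemset: Iterable[Hashable],
                transactions: List[Set[Hashable]],
                sigma: float) -> bool:
    """True when the support strictly exceeds the threshold `sigma`."""
    return support(itemset, transactions) > sigma


# Hypothetical toy database of four transactions (not taken from the paper).
toy_db = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support({"bread", "milk"}, toy_db))           # 2 of 4 transactions
print(is_frequent({"bread", "milk"}, toy_db, 0.4))  # support 0.5 exceeds 0.4
```

Because the text says the support must exceed the threshold, the comparison is strict: an itemset whose support equals `sigma` exactly is not reported as frequent in this sketch.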
Similar resources
Interestingness and Pruning of Mined Patterns
We study the following question: when can a mined pattern, which may be an association, a correlation, ratio rule, or any other, be regarded as interesting? Previous approaches to answering this question have been largely numeric. Specifically, we show that the presence of some rules may make others redundant, and therefore uninteresting. We articulate these principles and formalize them in the ...
On Optimal Rule Mining: A Framework and a Necessary and Sufficient Condition of Antimonotonicity
Many studies have shown the limits of support/confidence framework used in Apriori-like algorithms to mine association rules. There are a lot of efficient implementations based on the antimonotony property of the support but candidate set generation is still costly. In addition many rules are uninteresting or redundant and one can miss interesting rules like nuggets. One solution is to get rid ...
Direct Interesting Rule Generation
An association rule generation algorithm usually generates too many rules including a lot of uninteresting ones. Many interestingness criteria are proposed to prune those uninteresting rules. However, they work in post-pruning process and hence do not improve the rule generation efficiency. In this paper, we discuss properties of informative rule set and conclude that the informative rule set in...
An Association Rules Survey for Redundancy Reduction and Desired Rules with Ontology
In data mining, generating association rules is still an important research issue, and the usefulness of association rules is strongly limited by the huge number of delivered rules. To overcome this drawback, several methods were proposed for reducing the redundant rules and uninteresting patterns. However, being generally based on statistical information, most of these methods do not guarant...
On pruning strategies for discovery of generalized and quantitative association rules
Mining association rules has become an important data mining task, and meanwhile many algorithms have been developed which often differ in several aspects. In this paper, we analyse and compare the pruning strategies of several algorithms that were designed for mining generalised and quantitative association rules while abstracting from other technical details. Furthermore, we sketch a novel pru...